A heuristically accelerated reinforcement learning method for maintenance policy of an assembly line

Abstract

This paper investigates the maintenance policy for a two-machine one-buffer (2M1B) assembly line system. We assume that the observed quality states of the deteriorating machines in the system are characterized by multiple decreasing yield stages. A semi-Markov decision process (SMDP) model is used to describe the system, and a heuristically accelerated multi-agent reinforcement learning (HAMRL) method is used to solve the problem model. Asynchronous updating rules are introduced in the HAMRL method; the production time, the preventive maintenance (PM) time, and the corrective repair (CR) time are random, and the deterioration mode of each device is not fixed. A comparison with a simulated annealing search (SAS) based exploration algorithm, a neighborhood search (NS) based exploration algorithm, and reinforcement learning (RL) is also presented. The empirical results indicate that the proposed method can speed up the learning process and has a certain advantage on larger spaces and more practical problems. The maintenance strategy for the 2M1B system is obtained under the condition of a convergent average cost rate. The paper provides new insights into the application of selection techniques

Similar resources

Heuristically-Accelerated Reinforcement Learning: A Comparative Analysis of Performance

This paper presents a comparative analysis of three Reinforcement Learning algorithms (Q-learning, Q(λ)-learning and QS-learning) and their heuristically accelerated variants (HAQL, HAQ(λ) and HAQS), in which heuristics bias action selection and thus speed up learning. The experiments were performed in a simulated robot soccer environment which reproduces the conditions of a real competition lea...

Market-Based Dynamic Task Allocation Using Heuristically Accelerated Reinforcement Learning

This paper presents a Multi-Robot Task Allocation (MRTA) system, implemented on a RoboCup Small Size League team, in which robots participate in auctions for the available roles, such as attacker or defender, and use Heuristically Accelerated Reinforcement Learning to evaluate, in real time, their aptitude to perform these roles given the situation of the team. The performance of the task allocati...

Heuristically Accelerated Reinforcement Learning: Theoretical and Experimental Results

Since finding control policies using Reinforcement Learning (RL) can be very time consuming, in recent years several authors have investigated how to speed up RL algorithms by making improved action selections based on heuristics. In this work we present new theoretical results (convergence and a superior limit for value estimation errors) for the class that encompasses all heuristics-based al...

Heuristically Accelerated Q-Learning: A New Approach to Speed Up Reinforcement Learning

This work presents a new algorithm, called Heuristically Accelerated Q-Learning (HAQL), that allows the use of heuristics to speed up the well-known Reinforcement Learning algorithm Q-learning. A heuristic function H that influences the choice of the actions characterizes the HAQL algorithm. The heuristic function is strongly associated with the policy: it indicates that an action must be taken ...

Accelerated Primal-Dual Policy Optimization for Safe Reinforcement Learning

Constrained Markov Decision Process (CMDP) is a natural framework for reinforcement learning tasks with safety constraints, where agents learn a policy that maximizes the long-term reward while satisfying the constraints on the long-term cost. A canonical approach for solving CMDPs is the primal-dual method which updates parameters in primal and dual spaces in turn. Existing methods for CMDPs o...

Journal

Journal title: Journal of Industrial and Management Optimization

Year: 2023

ISSN: 1547-5816, 1553-166X

DOI: https://doi.org/10.3934/jimo.2022047